ABSTRACT



A computer-implemented method and apparatus for processing
a spoken request from a user to control an automobile
device. A speech recognizer recognizes a user's speech input 
and a speech understanding module determines semantic 
components of the speech input. A dialogue manager deter- 
mines insufficiency in the input speech, and also provides 
the user with information about a device in response to the 
input speech. 

18 Claims, 4 Drawing Sheets 
METHOD FOR NATURAL DIALOG
INTERFACE TO CAR DEVICES 

BACKGROUND AND SUMMARY OF THE 
INVENTION 

The present invention relates generally to an automobile 
device controller and, more particularly, to an apparatus and 
method for using natural dialog to control operation of an 
automobile system, such as a navigation system. 

In the field of operator controlled automobile systems and 
devices, the increasing use of technology has resulted in 
several useful systems. For example, global positioning
systems (GPS) in combination with road atlases stored in a 
database on the vehicle provide an intelligent navigation 
system for directing the driver. As another example, car 
audio systems integrate radio receivers, cassette tape decks, 
and single or multiple-disk compact disk players into a 
single system that often includes several modes of operation. 
Regardless of the vehicle system, such complex systems are 
generally operated by push button, remote control, or 
on-screen displays. Operation of such systems distracts the
vehicle operator from devoting full attention and concentration
to safely operating the vehicle.

The present invention is directed to an apparatus for 
providing a natural dialog interface for a device installed on 
an automobile. The interface includes a speech recognizer
that recognizes input speech provided by a user. A speech
understanding module connects to the
speech recognizer. The speech understanding module deter- 
mines semantic components of the input speech. A dialog 
manager connects to the speech understanding module. The 
dialog manager determines a condition of insufficient 
semantic information existing within the input speech based 
upon the determined semantic components and provides 
information to the user about the device in response to the 
input speech. 

For a more complete understanding of the invention, its 
objects and advantages, reference should be made to the 
following specification and to the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a natural dialog interface 
arranged in accordance with the principles of the present 
invention; 

FIG. 2 is a block diagram depicting the components of
the natural language parser of FIG. 1; and

FIGS. 3a-3b are flow charts depicting the operation of the 
natural dialog interface. 

DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

A presently preferred embodiment of the natural language 
interface 10 arranged in accordance with the principles of 
the present invention is illustrated in FIG. 1. Input speech
from the user 8 is supplied through a suitable audio interface 
and digitizer for input to speech recognizer 12. The output 
of speech recognizer 12 is supplied to a natural language 
parser 14. 

Natural language parser 14 works in conjunction with a 
set of grammars 16 that define the semantics of what natural 
language parser 14 can understand. The details of the parser 
are discussed more fully below. Essentially, however, the 
parser operates on a goal-oriented basis identifying key 
words and phrases from the recognized speech and using 
those recognized words and phrases to fill slots in predefined
templates or frames that represent different goal-oriented
tasks. Natural language parser 14 also works in conjunction
with a semantic representation of the automobile device
modes and commands 18 of the devices controlled by
natural language interface 10. The device modes
and commands in each device are stored in grammars 16. 
Natural language parser 14 thus consults the semantic rep- 
resentation of device modes and commands 18 to determine 
what responses to present to the user and what control 
commands to output to device controllers as will be 
described herein. 

Because natural language interface 10 enables the user to
interact with devices whose number, modes, and commands
may be constantly changing, a mechanism enables
downloading the latest devices, modes, and commands
into grammars 16 of natural language parser 14. This
function is performed by mapping module 20. Mapping 
module 20 downloads electronic device, mode, and command
information from one or a number of context modules
22, 24, 26. Context modules 22, 24, 26 provide device mode
and command information to mapping module 20 to facili- 
tate identification of key words and phrases by natural 
language parser 14. 
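
As a rough illustration of the mapping step described above, the following Python sketch merges device, mode, and command information from several context modules into a single grammar table. The dictionary layout, the function name `map_into_grammars`, and the sample phrases are assumptions made for illustration only; the specification does not define a data format for context modules or grammars.

```python
from collections import defaultdict


def map_into_grammars(context_modules):
    """Merge per-device mode/command phrases into one grammar table.

    context_modules: iterable of dicts shaped like
        {"device": str, "modes": {mode_name: [command phrases]}}
    (a hypothetical format, not taken from the specification).
    """
    grammars = defaultdict(lambda: defaultdict(set))
    for module in context_modules:
        for mode, phrases in module["modes"].items():
            grammars[module["device"]][mode].update(p.lower() for p in phrases)
    # Convert to plain dicts of sorted lists for stable lookup and display.
    return {
        device: {mode: sorted(phrases) for mode, phrases in modes.items()}
        for device, modes in grammars.items()
    }


# Hypothetical context modules A and B (navigation) and C (audio).
module_a = {"device": "navigation", "modes": {"route": ["go to", "start from"]}}
module_b = {"device": "navigation", "modes": {"position": ["where am I"]}}
module_c = {"device": "audio", "modes": {"radio": ["tune to", "next station"]}}

grammars = map_into_grammars([module_a, module_b, module_c])
```

Because the table is rebuilt from whatever modules are supplied, adding or replacing a context module only requires calling the function again with the new list.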

The subject invention will be described with particular
respect to natural language interface 10 operating a navigation
system and an audio system. More particularly, context
module A 22 and context module B 24 supply navigation- 
related context information to natural language parser 14. 
More specifically, context module A 22 provides context
information to support operation of navigation system 28.
Navigation system 28 provides directions and other naviga- 
tion information to user 8. Context module A 22 represents 
a navigation module such as a map database stored within 
the vehicle or downloaded via a telecommunication connection.
Context module B 24 also provides navigation information
from an alternate source, such as a global positioning
system (GPS) receiver. Similarly, context module C 26 
provides information to natural language parser 14 for 
facilitating identification of keywords and phrases from the 
recognized speech such as for an audio system 30. Audio 
system 30 may comprise one or a combination of radio, 
cassette tape deck, compact disk player, or multi-compact 
disk player. 

Returning to mapping module 20, mapping module 20
downloads the electronic context information from context
modules 22, 24, 26 into grammars 16 for use by natural 
language parser 14. Mapping module 20 has a priori knowl- 
edge of the overall structure of the devices, modes and 
commands downloaded from context modules 22, 24, 26.
Mapping module 20 would thus be aware that context
modules 22, 24, 26 provide information on both navigation
and audio. Mapping module 20 then uses this a priori 
knowledge to map the information into grammars 16. 

From time to time, a context module or system controlled
through natural language interface 10 may change. To
accommodate such a change, natural language interface 10
includes a mapping module updater 32. Mapping module
updater 32 receives update information over one or more of
the Internet, a telecommunication link, or directly from a
newly added context
module. If the overall structure of context information 
provided by context modules 22, 24, 26 changes so that 
mapping module 20 no longer correctly maps context information
into grammars 16, mapping module updater 32
updates mapping module 20.

In a particular aspect of the subject invention, natural 
language interface 10 includes a dialog manager 34 which 
generates commands to a navigation controller 36 which in
turn generates control commands for navigation system 28 
or audio system 30. As described above, navigation system 
28 may be represented as a GPS receiver or other radio 
navigation device, a dead reckoning system, a mapping and 
direction system or the like. Dialog manager 34 generates 
control requests to navigation controller 36 which in turn 
generates control commands to navigation system 28. Such 
commands include setting desired start points, destination 
points, intermediate points, as well as requesting various 
useful navigation information. Similarly, dialog manager 34 
generates control commands to audio controller 38 which in 
turn generates control commands to audio system 30. Audio 
controller 38 may generate commands to control operation 
or request information from the audio components, includ- 
ing present radio station, present order of play of compact 
disk player, volume, audio levels and the like. 

In some situations, the user 8 does not provide sufficient 
information for dialog manager 34 to generate control 
requests to navigation controller 36 or audio controller 38. 
In such situations, dialog manager 34 utilizes the output of 
natural language parser 14 to capture the user's requests so 
that command requests can be properly generated to navi- 
gation controller 36 or audio controller 38. Dialog manager 
34 then generates control commands to navigation controller 
36, which in turn generates control commands for naviga- 
tion system 28. Similarly, after refining the user's request, 
dialog manager 34 generates control commands to audio 
controller 38 which in turn generates control commands to 
audio system 30. 

In some situations, even with context information, the 
user does not provide sufficient information for dialog 
manager 34 to generate control requests to navigation con- 
troller 36 or audio controller 38. In such situations, dialog 
manager 34 generates speech commands to speech synthe- 
sizer 40 and/or on-screen display 42 to prompt the user for 
additional information or clarification of existing information.
Speech synthesizer 40 preferably utilizes a frame text-to-speech
system, which is a system where a sentence to
synthesize includes a fixed part and variable slots, in order
to synthesize inquiries output by dialog manager 34. 
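
The idea of a sentence with a fixed part and variable slots can be sketched with Python's standard `string.Template`. The wording of the prompt and the slot names `street` and `landmark` are hypothetical; the resulting text would be handed to whatever speech synthesizer is in use, which is not modeled here.

```python
from string import Template

# Fixed part of the sentence with two variable slots ($street, $landmark).
# The prompt wording is an assumption made for illustration.
INQUIRY = Template("Which $street do you mean: the one near $landmark?")


def build_inquiry(street, landmark):
    """Fill the variable slots of the fixed sentence frame."""
    return INQUIRY.substitute(street=street, landmark=landmark)


prompt = build_inquiry("State Street", "the courthouse")
```

`substitute` raises `KeyError` if a slot is left unfilled, which keeps an incomplete inquiry from reaching the synthesizer.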

By virtue of utilizing a semantic representation of the 
context information, natural language interface 10 performs 
a filtering of the information contained in context modules 
22, 24, 26. Further, dialog manager 34 operates in conjunc- 
tion with a profile data store 44. Profile data store 44 
contains user profile information. Such information may 
include, with respect to navigation, recent geographical 
locations where the user has operated the vehicle or has 
requested directions. With respect to the audio system, such 
information may include radio system presets, musical 
selection from a compact disk player, audio system volume, 
or other tonal controls. Profile data store 44 contains data for
voice identification techniques or adaptive recognition. 
Further, in certain modes, identification of particular users 
may enable dialog manager 34 to preset any and all vehicle 
systems to predefined user preferences for any and all 
vehicle systems interconnected to dialog manager 34. 

By way of an example for operating a vehicle navigation 
system, natural language parser 14 may define a semantic 
frame associated with each command. A semantic frame 
includes slots for a geographical location, such as may be 
defined by zip code, intersection of two roads, a local 
landmark or point of interest, or other predefined location. 
One or several of these slots must be defined for the frame 
to be activated. The user may fill the semantic frame using 
natural speech. For example, the user may input "I am now 
on State Street and I want to go to Santa Barbara Street." By
analyzing the sentence and understanding key phrases such 
as "now" and "I want to go to", natural language interface 
10 may automatically determine a start and end point for
input to navigation system 28 via navigation controller 36.
Suppose that more than one State Street exists in a city. By
virtue of input from context modules 22 and 24, natural
language interface 10 may detect the ambiguity, and dialog
manager 34 may output an inquiry through speech synthesizer
40. Such an inquiry may seek to determine whether the
user is near a particularly well known landmark in the
vicinity of one of the State Streets. Based on the response,
dialog manager 34 generates the particular commands.
Alternatively, if one of the context modules provides GPS
information, such information may be utilized to determine
on which of the State Streets the user is traveling.
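
One way the GPS-based alternative could work is to pick the candidate street closest to the current GPS fix. The sketch below uses the standard haversine great-circle formula; the candidate records, their coordinates, and the function names are hypothetical, and the specification does not say how the comparison is actually performed.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def pick_street(candidates, gps_fix):
    """Return the candidate whose reference point is nearest the GPS fix."""
    return min(
        candidates,
        key=lambda c: haversine_km(c["lat"], c["lon"], gps_fix[0], gps_fix[1]),
    )


# Hypothetical reference points for two streets that share a name.
candidates = [
    {"name": "State Street (north)", "lat": 34.45, "lon": -119.75},
    {"name": "State Street (south)", "lat": 34.41, "lon": -119.69},
]
chosen = pick_street(candidates, gps_fix=(34.42, -119.70))
```

If the two nearest candidates are about equally distant, falling back to the spoken landmark inquiry described above would be the safer choice.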

FIG. 2 depicts components of natural language interface 
10 in greater detail. In particular, speech understanding 
module 50 includes a local parser 52 to identify the
predetermined, relevant task-related fragments. Speech
understanding module 50 also includes a global parser 54 to 
extract the overall semantics of the request of the user. 

In a preferred embodiment, local parser 52 utilizes multiple
small grammars along with several passes and a unique
scoring mechanism to parse hypotheses. For example,
according to this approach local parser 52 recognizes
phrases such as addresses, intersections, landmarks, zip
codes and the like with respect to navigation, and music
tracks, tonal controls, and the like with respect to audio
systems. If the speaker utters "I am now on State
Street and I want to go to Santa Barbara Street", the local
parser recognizes "State Street" and "Santa Barbara Street"
as locations, and extracts this semantic information. Global
parser 54 assembles these items in the context of the entire
sentence and recognizes that the speaker wishes to go from
State Street, which is the present location, to Santa Barbara 
Street, which is the target location. 
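
The division of labor between the two parsers can be sketched in a few lines of Python. This is a deliberately simplified stand-in: a single regular expression plays the role of the local parser's grammars, and the global step assigns roles from the nearest preceding cue phrase. The pattern, the cue phrases, and the function names are assumptions; the multiple passes and scoring mechanism mentioned above are not modeled.

```python
import re

# Local pass: a small pattern that spots street-name fragments.
# It is a hypothetical stand-in for the multiple small grammars.
LOCATION = re.compile(r"\b((?:[A-Z][a-z]+ )+Street)\b")


def local_parse(utterance):
    """Return (text, start_offset) for each location fragment found."""
    return [(m.group(1), m.start()) for m in LOCATION.finditer(utterance)]


def global_parse(utterance, fragments):
    """Assign each fragment a role from the cue phrase that precedes it."""
    frame = {}
    for text, start in fragments:
        before = utterance[:start].lower()
        on_pos = before.rfind("now on")   # cue for the present location
        to_pos = before.rfind("go to")    # cue for the target location
        if on_pos == -1 and to_pos == -1:
            continue  # no cue seen yet; leave the fragment unassigned
        # The cue occurring closest before the fragment decides its role.
        frame["start" if on_pos > to_pos else "end"] = text
    return frame


utterance = "I am now on State Street and I want to go to Santa Barbara Street"
fragments = local_parse(utterance)
frame = global_parse(utterance, fragments)
```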

Speech understanding module 50 includes knowledge
database 56 which encodes the semantics of a domain. In
this sense, knowledge database 56 is preferably a
domain-specific database as depicted by reference numeral 58, and
is utilized by dialog manager 34 to determine whether a
particular action related to achieving a predetermined goal is
possible.

The preferred embodiment encodes the semantics via a 
frame data structure 62. Frame data structure 62 contains 
empty slots 64 which are filled when the semantic interpretation
of global parser 54 matches the frame. For example,
a frame data structure, whose domain is navigation
commands, includes empty slots for specifying the start and 
end location. If user 8 has provided a proper start and end 
location, then the empty slots are filled with this information.
However, if that particular frame is not completely
filled after user 8 has initially provided speech input, dialog
manager 34 instructs computer response module 68 to ask 
user 8 to provide the remaining information, whether the 
remaining information is the start or end location. 
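
A frame with empty slots maps naturally onto a small data class. The following is a minimal sketch, assuming a slot is "empty" when its value is `None`; the class name `Frame`, the task name, and the slot names are illustrative and not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A goal-oriented frame: a task name plus its slots (None = empty)."""
    task: str
    slots: dict = field(default_factory=dict)

    def fill(self, **values):
        """Fill named slots, rejecting names the frame does not define."""
        for name, value in values.items():
            if name not in self.slots:
                raise KeyError(f"frame {self.task!r} has no slot {name!r}")
            self.slots[name] = value

    def missing(self):
        """Names of slots that are still empty."""
        return [name for name, value in self.slots.items() if value is None]


route = Frame("navigate", {"start": None, "end": None})
route.fill(end="Santa Barbara Street")
missing = route.missing()  # slots the dialog manager still has to ask about
```

Here `missing` contains only `"start"`, so the follow-up question to the user would concern the start location.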

The frame data structure 62 preferably includes multiple
frames, each of which in turn has multiple slots. One frame
may have slots directed to specific attributes of navigation,
such as start and end points, distance to predetermined
points, and the like. Other frames may have attributes
directed to various aspects of audio system control, including
station presets, CD selection, and tonal selection. The
following reference discusses local and global parsers and
frames: R. Kuhn and R. De Mori, Spoken Dialogs With
Computers (Chapter 14: Sentence Interpretation), Academic
Press, Boston (1998).

Dialog manager 34 uses dialog history data file 70 to
assist in filling empty slots before requesting user 8 for
specific information. Dialog history data file 70 contains a
log of conversations through the natural language interface
10. For example, if the speaker states "I am now on State
Street and I want to go to Santa Barbara Street," dialog
manager 34 examines the dialog history file 70 to determine
what start and end locations user 8 has already selected or
rejected in previous dialog exchanges. If user 8 has previously
selected a State Street in, for example, a northern
section of the city, dialog manager 34 fills the empty slot
for the start location with that particular State Street. If a
sufficient number of slots have been filled, natural language
interface 10 will ask user 8 to verify and confirm the
program selection. Thus, if any assumptions made by dialog
manager 34 through use of dialog history data file 70 prove
to be incorrect, the speaker can correct the assumption.

Preferably, computer response module 68 is multi-modal
and provides a response to user 8 via speech synthesis, text,
or graphics. For example, if user 8 has requested directions
to a particular location, computer response module 68 could
display a graphical map with the terms spoken by the user
displayed on the map after being formatted by format
module 72. Moreover, computer response module 68 can
speak the directions to the user using speech synthesis. In
one embodiment, computer response module 68 uses the
semantics that have been recognized to generate a sentence
based on the semantic concept. Alternatively, sentences are
automatically generated based on per-type sentences which
have been constructed from slots available in a semantic
frame. However, one skilled in the art will recognize that the
present invention is not limited to having all three modes
present, as it can contain one or more of the modes of the
computer response module 68.

In another embodiment, dialog manager 34 instructs
computer response module 68 to perform a search on remote
database 74 in order to provide user 8 with timely traffic
information about routes between the start or end locations.
Remote database 74 can perform communications with
dialog manager 34 through conventional methods, such as
via a radio frequency communication mode. This alternative
embodiment substantially improves the dialog between user
8 and dialog manager 34 by providing information to user 8
so that user 8 can formulate an improved request through
natural language interface 10.

Dialog manager 34 assumes an integral role in the dialog
by performing a back-and-forth with user 8 before initiating
a command request to navigation controller 36 or audio
controller 38. In such a role, dialog manager 34 utilizes
teachings of the present invention to effectively manage the
turn-taking aspects of human-like back-and-forth dialog.
Dialog manager 34 is able to make its own decision about
which direction the dialog with user 8 will take next and
when to initiate a new direction.

For example, if user 8 has requested to go from a
particular start point to a particular end point, dialog manager
34 determines whether such a start point or end point
proves logical given the context information provided by context
modules 22, 24. Such a determination may be made based on
input from context module A 22 or context module B 24. In
this example, if dialog manager 34 determines that such a
start location is not logical, dialog manager 34
selects a more likely, alternative start location, based on GPS
positioning information. Thus, dialog manager 34 can determine
whether a particular action or goal of the user is
feasible to assist the user to accomplish this goal.

Natural language parser 14 analyzes and extracts semantically
important and meaningful topics from a loosely
structured natural language text which may have been
generated as the output of an automatic speech recognition
(ASR) system used by a dialog or speech understanding
system. Natural language parser 14 translates the natural
language text input to a new representation by generating
well-structured tags containing topic information and data,
and associating each tag with the segments of the input text
containing the tagged information. In addition, tags may be
generated in other forms such as a separate list, or as a
semantic frame.

Preferably, natural language parser 14 includes a robust
[illegible] of grammatically incorrect
English sentences, due to the following reasons: the input to
the recognizer is casual, dialog style, natural speech that can
contain broken sentences, partial phrases, and the insertion,
omission, or mis-recognition of errors by speech recognizer
12, even when the speech input is considered correct.
Natural language parser 14 deals robustly with all types of
input and extracts as much information as possible.

FIGS. 3a-3b depict operational steps associated with the
dialog speech processing system of FIG. 2. FIGS. 3a-3b will
be described with respect to control of a navigation system.
One skilled in the art will recognize that these operations
may apply equally to an audio system. Start block 80
indicates that process block 82 is to be processed. At process
block 82, the user speaks to the device of the present
invention about being at a present location and desiring to go
to a target location. At process block 84, the user's speech
is recognized by the present invention, and at process block
86, predetermined words or phrases of the user's speech are
recognized, such as phrases about start or end locations.
Control next proceeds to process block 88. Process block
88 determines the semantic parts of the user's speech by
utilizing the local and global parser. Control then proceeds
to process block 90 which populates the proper frames with
the determined semantic parts of the user's speech. Control
then proceeds to continuation block A 92, which leads to
FIG. 3b.

With reference to FIG. 3b, decision block 94 determines
whether a sufficient number of slots have been populated to
control the automobile device(s). If a sufficient number of
slots have been populated to control the device(s), control
proceeds to process block 96 which generates commands to
control the automobile device(s). Control then proceeds to
process block 98 where dialog manager 34 vocalizes the result
of the command to the user. After vocalization of the result,
processing terminates at end block 100.

If decision block 94 determines that an insufficient number
of slots have been populated to control the automobile
device(s), process block 101 attempts to fill any missing
slots with information from a context module search. For
example, if the user has specified a destination, but has
not provided a starting point, the present invention queries
information provided by the context modules in order to
determine possible start points. If necessary, control proceeds
to process block 102 which attempts to fill any missing
slots with information from the dialog history file. Process
block 104 constructs an inquiry to the user regarding the
missing slots which have not yet been filled. Process block
106 performs speech synthesis of the constructed inquiry,
and at process block 108, the user responds with the information.
Control then proceeds, via continuation block 110,
back to process block 84, where the user's speech is recognized.
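
The order in which missing slots are filled in FIG. 3b (context module search, then dialog history, then an inquiry to the user) can be summarized in a short Python sketch. The function name `resolve_slots` and the three interfaces it takes are hypothetical. The sketch also makes two simplifications: the loop back to process block 84 is reduced to a single call of `ask_user`, and the confirmation of values taken from the dialog history is omitted.

```python
def resolve_slots(frame, context_lookup, history, ask_user):
    """Fill empty slots (value None) in the order described for FIG. 3b.

    frame:          dict of slot name -> value or None
    context_lookup: callable(slot) -> value or None    (cf. block 101)
    history:        dict of slot -> earlier selection  (cf. block 102)
    ask_user:       callable(slots) -> dict of answers (cf. blocks 104-108)
    All three interfaces are assumptions made for this sketch.
    """
    for slot in [name for name, value in frame.items() if value is None]:
        frame[slot] = context_lookup(slot)
        if frame[slot] is None:
            frame[slot] = history.get(slot)
    still_missing = [name for name, value in frame.items() if value is None]
    if still_missing:
        frame.update(ask_user(still_missing))
    return frame


asked = []


def ask_user(slots):
    """Stand-in for the synthesized inquiry and the user's reply."""
    asked.extend(slots)
    return {slot: "user answer" for slot in slots}


frame = resolve_slots(
    {"start": None, "end": "Santa Barbara Street", "via": None},
    context_lookup=lambda slot: "State Street" if slot == "start" else None,
    history={},
    ask_user=ask_user,
)
```

In this run the start location comes from the context lookup, so the user is asked only about the intermediate point.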






US 6,598, 


While the invention has been described in its presently 
preferred form, it is to be understood that there are numerous 
applications and implementations for the present invention. 
Accordingly, the invention is capable of modification and 
changes without departing from the spirit of the invention as
set forth in the appended claims. 

What is claimed is: 

1. An apparatus for providing a natural dialog interface for 
a device installed on an automobile, comprising: 

a speech recognizer, the speech recognizer recognizing
input speech provided by a user; 

a speech understanding module connected to the speech 
recognizer, the speech understanding module determining
semantic components of the input speech using a
set of grammars;

a mapping module updating the set of grammars using at 
least one context module, the context module providing 
information about the devices; 

a dialog manager connected to the speech understanding
module, the dialog manager doing at least one of 
determining a condition of insufficient semantic infor- 
mation existing within the input speech based upon the 
determined semantic components and for providing 
information to the user about the device in response to
the input speech. 

2. The apparatus of claim 1 further comprising a context 
module, the context module providing information to the 
speech understanding module to assist with determining the 
semantic components of the input speech.

3. The apparatus of claim 1 further comprising a plurality 
of context modules, each context module providing infor- 
mation to the speech understanding module to assist with 
determining the semantic components of the input speech. 

4. The apparatus of claim 1 further comprising a device
controller, the device controller receiving commands from
the dialog manager and generating control commands to
operate the device.

5. The apparatus of claim 1 wherein the dialog manager 
includes a speech synthesizer, the speech synthesizer providing
the user with synthesized speech information about
available selections. 

6. The apparatus of claim 1 wherein the speech under- 
standing module is a goal-oriented speech understanding 
module defining a plurality of goal-oriented frames having
slots corresponding to control commands output by the 
device controller. 

7. The apparatus of claim 1 wherein the speech under- 
standing module is a natural language speech understanding 
module having a set of predefined grammars that correspond
to control commands output by the device controller. 

8. The apparatus of claim 1 wherein the dialog manager 
includes a user profile database for storing a representation 
of past use by a user of the apparatus, and wherein the dialog 
manager utilizes the profile database.

9. The apparatus of claim 8 wherein the profile database 
includes at least one of data for voice identification and data 
for adaptive voice recognition. 

10. An apparatus for providing a natural dialog interface
for an automobile navigation system, comprising: 

a speech recognizer, the speech recognizer recognizing 
input speech provided by a user; 

a speech understanding module connected to the speech 
recognizer, the speech understanding module determining
semantic components of the input speech using a
set of grammars; 

a mapping module updating the set of grammars using at 
least one context module, the context module providing 
information about the devices; 

a dialog manager connected to the speech understanding 
module, the dialog manager doing at least one of 
determining a condition of insufficient semantic infor- 
mation for controlling the navigation system existing 
within the input speech based upon the determined 
semantic components and for providing information to 
the user about the navigation system in response to the 
input speech. 

11. The apparatus of claim 10 further comprising a 
plurality of context modules, each context module providing 
navigation information to the speech understanding module 
to assist with determining the semantic components of the 
input speech. 

12. The apparatus of claim 10 further comprising a 
context module, the context module providing navigation 
information to the speech understanding module to assist 
with determining the semantic components of the input 
speech. 

13. The apparatus of claim 12 further comprising a 
navigation controller, the navigation controller receiving 
commands from the dialog manager and generating control 
commands to operate the navigation system. 

14. The apparatus of claim 13 wherein the dialog manager 
includes a speech synthesizer, the speech synthesizer pro- 
viding the user with synthesized speech information about 
the navigation system. 

15. The apparatus of claim 14 wherein the speech under- 
standing module is a goal-oriented speech understanding 
module defining a plurality of goal-oriented frames having 
slots corresponding to control commands output by the 
navigation controller. 

16. The apparatus of claim 15 wherein the speech under- 
standing module is a natural language speech understanding 
module having a set of predefined grammars that correspond 
to navigation control commands output by the navigation 
controller. 

17. The apparatus of claim 16 wherein the dialog manager 
includes a user profile database for storing a representation 
of past use by a user of the apparatus, and wherein the dialog 
manager utilizes the profile database. 

18. The apparatus of claim 10 wherein the profile database 
includes at least one of data for voice identification and data 
for adaptive voice recognition. 

* * * * * 